Predicting speech intelligibility in conditions with nonlinearly processed noisy speech

Authors

  • Søren Jørgensen
  • Torsten Dau
Abstract

The speech-based envelope power spectrum model (sEPSM; [1]) was proposed to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII). The sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv), which was demonstrated to successfully predict speech intelligibility in conditions with nonlinearly processed noisy speech, such as processing with spectral subtraction. Moreover, a multiresolution version (mr-sEPSM) was demonstrated to account for speech intelligibility in various conditions with stationary and fluctuating interferers [2]. However, the model fails in the case of phase jitter distortion, in which the spectral structure of speech is affected but the temporal envelope is maintained. This suggests that an across-audio-frequency mechanism is required to account for this distortion. It is demonstrated that a measure of the across-audio-frequency variance at the output of the modulation-frequency selective process in the model is sufficient to account for the phase jitter distortion. Thus, a joint spectro-temporal modulation analysis, as proposed in [3], does not seem to be required. The results are consistent with concepts from computational auditory scene analysis and further support the hypothesis that the SNRenv is a powerful metric for speech intelligibility prediction.
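
To make the SNRenv idea concrete, the following is a minimal single-band sketch, not the authors' implementation. It assumes that the envelope power of the speech is estimated by subtracting the envelope power of the noise alone from that of the noisy mixture, and that envelope power is the envelope variance normalised by the squared envelope mean. The audio-frequency filterbank, the modulation filterbank, the multiresolution segmentation, and the across-audio-frequency variance measure mentioned in the abstract are all omitted. The function names and the `floor` parameter are illustrative choices, not taken from the paper.

```python
import numpy as np

def envelope(x):
    """Hilbert envelope via an FFT-based analytic signal (numpy only)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # and zero the negative frequencies.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * weights))

def envelope_power(x):
    """Envelope variance normalised by the squared envelope mean."""
    env = envelope(x)
    return np.var(env) / np.mean(env) ** 2

def snr_env(noisy_speech, noise, floor=1e-3):
    """Single-band envelope-domain SNR (assumed simplification).

    The speech envelope power is estimated as the mixture envelope power
    minus the noise envelope power; `floor` is a hypothetical lower limit
    that keeps both terms positive.
    """
    p_mix = envelope_power(noisy_speech)
    p_noise = max(envelope_power(noise), floor)
    p_speech = max(p_mix - p_noise, floor)
    return p_speech / p_noise
```

In this sketch, `snr_env(mixture, noise)` takes two equal-rate signals, the noisy (possibly processed) speech and the noise alone passed through the same processing, and returns a linear ratio. Processing that leaves more speech-driven envelope fluctuation in the mixture, relative to the fluctuation of the noise itself, yields a larger value.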

Similar resources

Prediction of intelligibility of noisy and time-frequency weighted speech based on mutual information between amplitude envelopes

This paper deals with the problem of predicting the average intelligibility of noisy and potentially processed speech signals, as observed by a group of normal-hearing listeners. We propose a prediction model based on the hypothesis that intelligibility is monotonically related to the amount of Shannon information the critical-band amplitude envelopes of the noisy/processed signal convey ab...

SNR loss: A new objective measure for predicting the intelligibility of noise-suppressed speech

Most of the existing intelligibility measures do not account for the distortions present in processed speech, such as those introduced by speech-enhancement algorithms. In the present study, we propose three new objective measures that can be used for prediction of intelligibility of processed (e.g., via an enhancement algorithm) speech in noisy conditions. All three measures use a critical-ban...

The characterization of the relative information content by spectral features for the objective intelligibility assessment of nonlinearly processed speech

The objective intelligibility assessment of nonlinearly enhanced speech is a widely experienced problem. Nonlinear speech enhancement processors operate primarily on the low-level and transient components of speech. As these sections contain important acoustic cues as well as context-constitutive information, they dominate speech intelligibility. For that reason, short-time intelligibility measu...

Blind Non-Intrusive Speech Intelligibility Prediction Using Twin-HMMs

Automatic prediction of speech intelligibility is highly desirable in the speech research community, since listening tests are time-consuming and cannot be used online. Most of the available objective speech intelligibility measures are intrusive methods, as they require a clean reference signal in addition to the corresponding noisy/processed signal at hand. In order to overcome the problem of...

Evaluation of Objective Intelligibility Prediction Measures for Speech Enhancement in Mandarin

In this paper, we evaluate the performance of several state-of-the-art objective measures in predicting the intelligibility of noisy Mandarin speech processed by speech enhancement algorithms. The speech signals were first corrupted by three types of noise at two signal-to-noise ratios, followed by four classes of speech enhancement algorithms. The objective intelligibility...

Publication date: 2017